How Useful Are Non-Blocking Loads, Stream Buffers and Speculative Execution in Multiple Issue Processors?
Authors
Abstract
We investigate the relative performance impact of non-blocking loads, stream buffers, and speculative execution, both used individually and in conjunction with each other. We have simulated the SPEC92 benchmarks on a statically scheduled quad-issue processor model, running code from the Multiflow compiler. Non-blocking loads and stream buffers both provide a significant performance advantage, and their combination performs significantly better than either alone. For example, with a 64-byte, 2-way set associative cache with 32 cycle fetch latency, non-blocking loads reduce the run-time by 21% while stream buffers reduce it by 26%, and the combined use of the two yields a 47% reduction. The addition of speculative execution further improves the performance of the systems that we have simulated, with or without non-blocking loads and stream buffers, by an additional 20% to 40%. We expect that the use of all three of these techniques will be important in future generations of microprocessors.
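To put the quoted run-time reductions in perspective, the following minimal sketch converts them to speedup factors and compares the reported 47% combined reduction with what two techniques would yield if each scaled run time independently. The multiplicative independence model and the function names are assumptions made for illustration; they are not taken from the paper.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional run-time reduction into a speedup factor."""
    return 1.0 / (1.0 - reduction)


def independent_combined_reduction(r1, r2):
    """Reduction expected if each technique scaled run time independently.

    Assumed model (not from the paper): remaining run time is the
    product of the two individual remaining fractions.
    """
    return 1.0 - (1.0 - r1) * (1.0 - r2)


# Figures quoted in the abstract for the example cache configuration.
nonblocking_loads = 0.21
stream_buffers = 0.26
reported_combined = 0.47

expected_if_independent = independent_combined_reduction(
    nonblocking_loads, stream_buffers
)

print(f"independent model:  {expected_if_independent:.1%} reduction")
print(f"reported combined:  {reported_combined:.1%} reduction")
print(f"combined speedup:   {speedup_from_reduction(reported_combined):.2f}x")
```

Under this assumed model the reported combination exceeds the independent estimate, and it equals the sum of the two individual reductions (21% + 26% = 47%), which is consistent with the abstract's statement that the combination performs significantly better than either technique alone.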
Similar resources
Franklin and Sohi: ARB - a Hardware Mechanism for Dynamic Reordering of Memory
To exploit instruction level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references, especially to execute loads before stores that precede them in the sequential instruction stream. To guarantee correctness of execution in such situations, memory reference addresses have to be disambiguated. This paper presents a novel hardwa...
A comparison of three architectures: Superscalar, Simultaneous Multithreading CPUs and Single-Chip Multiprocessor
Recent years have seen a great deal of interest in multiple-issue machines or superscalar processors, processors that can issue several mutually independent instructions in the same cycle. These machines exploit the parallelism that programs exhibit at the instruction level. The superscalar processor designs dynamically extract parallelism by executing many instructions within a single, sequent...
An Evaluation of Memory Consistency Models for Shared Memory Systems with ILP Processors (Rice University)
The memory consistency model of a shared memory multiprocessor determines the extent to which memory operations may be overlapped or reordered for better performance. Studies on previous generation shared memory multiprocessors have shown that relaxed memory consistency models like release consistency (RC) can significantly outperform the conceptually simpler model of sequential consistency (SC). C...
On Effective Data Supply For Multi-Issue Processors
Emerging multi-issue microprocessors require effective data supply to sustain multiple instruction processing. The data cache structure, the backbone of data supply, has been organized and managed as one large homogeneous resource, offering little flexibility for selective caching. While memory latency hiding techniques and multi-ported caches are critical to effective data supply, we show in th...
The Effect of Speculative Execution on Cache Performance
Superscalar microprocessors obtain high performance by exploiting parallelism at the instruction level. To effectively use the instruction-level parallelism found in general-purpose, non-numeric code, future processors will need to speculatively execute far beyond conditional branches that limit instruction fetch. One result of this deep speculation is an increase in the number of instruction and...